Module 6: Mixing Categorical and Continuous Predictors

PSYC 3032 M

Udi Alter

Things Udi Should Remember



Before we start

  • Group presentation is due in two weeks (March 25)
    • Post videos on eClass
    • More instructions are now available
  • Assignment 1 grades are available, wonderful job, everybody!
  • The only remaining evaluations are:
    • Group presentation
    • Lab 5 (easy peasy lemon squeezy)
    • A2

About Module 6

Goals for Today:

  • ANCOVA
    • Regression with a mix of categorical and continuous predictors
    • Parallelism: the assumption of homogeneity of slopes
  • Reviewing A1


What’s the first topping on your ideal pizza?



Do you have a lucky number? What is it?



Chocolate or vanilla?


A) Vanilla, duh!

B) Are you kidding me? Chocolate, what else?!

C) I like a mix (like ANCOVA!)




Mixing Categorical and Continuous Predictors

Working Example

  • Last week, we discussed an example where researchers (Baumann et al.) sought to determine how children’s reading comprehension scores after an intervention (i.e., posttest scores) differed by treatment group (control, DRTA, or TA)

Categorical Predictor Variables

  • We saw how we could use dummy coding to evaluate whether type of intervention is a meaningful explanatory variable of posttest scores
    • Where \(D1\) represents the mean difference between DRTA and control groups, and
    • \(D2\) represents the mean difference between TA and control groups
  • D1 and D2 then become the predictors in a multiple regression model (instead of the original grouping variable, e.g., group):

\[\hat{Reading \ score}_i = {\color{deeppink} {\beta_0}} + {\color{darkcyan} {\beta_1}}{\color{darkgrey} {D1_i}} + {\color{gold} {\beta_2}}{\color{lightblue} {D2_i}}\]

\[\hat{Reading \ score}_i = {\color{deeppink} {6.68}} + {\color{darkcyan} {3.09}}{\color{darkgrey} {D1_i}} + {\color{gold} {1.09}}{\color{lightblue} {D2_i}}\]

\[\hat{Reading \ score}_i = {\color{deeppink} {6.68}} + {\color{darkcyan} {3.09}}{\color{darkgrey} {(DRTA \ vs. \ Control)_i}} + {\color{gold} {1.09}}{\color{lightblue} {(TA \ vs. \ Control)_i}}\]
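A quick sketch of how these dummy codes could be built by hand in R (assuming, as in the rest of the deck, that the Baumann data are loaded in a data frame `read` with a factor `group`; the level names below are assumptions):

```r
# Sketch: hand-built dummy codes for the Baumann data
# (assumes data frame `read` with factor `group`; "Basal" = control level)
read$D1 <- as.numeric(read$group == "DRTA")  # 1 = DRTA, 0 = otherwise
read$D2 <- as.numeric(read$group == "TA")    # 1 = TA,   0 = otherwise

dummy_mod <- lm(posttest1 ~ D1 + D2, data = read)
coef(dummy_mod)
# Intercept = control-group mean; the D1 and D2 coefficients are the
# mean differences from control (6.68, 3.09, and 1.09 on the slide above)
```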

Regression with a Categorical Predictor is a One-Way ANOVA

\[\hat{Reading \ score}_i = {\color{deeppink} {\beta_0}} + {\color{darkcyan} {\beta_1}}{\color{darkgrey} {D1_i}} + {\color{gold} {\beta_2}}{\color{lightblue} {D2_i}} = \\ \hat{Reading \ score}_i = {\color{deeppink} {6.68}} + {\color{darkcyan} {3.09}}{\color{darkgrey} {D1_i}} + {\color{gold} {1.09}}{\color{lightblue} {D2_i}}\]

Say we wanted to find the reading comprehension mean of DRTA, how can we do it?


A) \(\beta_1 + \beta_2\)

B) \(D1+D2\)

C) \(\beta_0 + \beta_1\)

D) \(\beta_0 + \beta_2\)

E) “Jesus, take the wheel!”




What about including other predictors beyond a single categorical variable?

Adding More Predictors

  • Given that the dummy coding approach discussed last lecture is analogous to a one-way ANOVA, we could think of an MLR model with at least one categorical predictor and at least one continuous predictor as an ANCOVA (Analysis of Covariance)!

  • So, an ANCOVA model is just a certain type of multiple regression model that includes both categorical and continuous predictors

  • Typically, ANCOVA is used for comparing group means on an outcome variable while controlling for some continuous variable

  • It’s common for researchers to use the word “covariate” when referring to the continuous variable in ANCOVA
    • What they often mean by that is a variable that they cannot manipulate and that has little substantive or theoretical interest
    • But, as we learned, added variables in the model can be anything we want to condition on or control for statistically (e.g., confounders, forks, colliders)
    • i.e., you simply want to partial out the effects of the added variable in the analysis

Nice Meeting You, Ann Cova!

  • If ANOVA/regression with a categorical variable is commonly presented as a method for comparing group means, ANCOVA is often presented as a method for comparing adjusted means across groups (AKA conditional means)

  • The interpretation of the slope associated with the categorical predictor will change slightly, but you’re already familiar with this change!

  • This is no different from how the interpretation of any predictor changes when you move from SLR to MLR

QUICK EXAMPLE

  • Say we estimate a model where life satisfaction is regressed on different types of meditation interventions (e.g., control, mindfulness, mantra-based, and gratitude-based), conditioning on socio-economic status (SES)

  • Then, we can interpret the dummy-code representing one of the comparisons between meditation types as the mean difference in life satisfaction between, say, mindfulness and control, adjusted for SES (or holding SES constant)
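A hypothetical sketch of this model in R (the variable names, sample size, and effect sizes below are all invented for illustration, not from any real study):

```r
# Hypothetical data: every name and value here is made up for illustration
set.seed(2024)
n <- 200
meditation <- factor(sample(c("control", "mindfulness", "mantra", "gratitude"),
                            n, replace = TRUE))
meditation <- relevel(meditation, ref = "control")  # control = reference group
SES <- rnorm(n)
life_sat <- 5 + 1.5 * (meditation == "mindfulness") + 0.8 * SES + rnorm(n)

ancova_mod <- lm(life_sat ~ meditation + SES)
coef(ancova_mod)["meditationmindfulness"]
# = mean difference in life satisfaction between mindfulness and control,
#   holding SES constant (should land near the simulated value of 1.5)
```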

One More Thing, Ann

  • But, there’s this other thing…

  • ANCOVA has an additional critical assumption which is known as homogeneity of regression, or parallelism

  • Parallelism means that the relationship between the continuous variable, \(X\), and the outcome, \(Y\), is assumed to be constant, or homogeneous, across the levels of the categorical predictor

  • Put another way, the traditional ANCOVA model assumes that there is no interaction between \(X\) and the categorical variable

  • But, if we reframe ANCOVA as multiple regression, it’s easy to relax this assumption by including interaction terms between \(X\) and the dummy-coded variables representing group membership (we will address interactions in Module 7)

  • Another way to think about the parallelism assumption from a regression framework is the assumption that your model is properly specified

    • i.e., you didn’t “miss” an interaction that exists in the population/data-generating mechanism

Parallelism Assumption

ANCOVA Example

  • Yes, the reading comprehension example, again!

  • Baumann et al. (1992) were actually interested in how the groups differed in their post-intervention reading test scores over and above any differences on a reading test score administered before the intervention

  • This is an example of a classic “Pre-Post” research question for which ANCOVA is often applied:

    • “How do treatment groups differ on posttest scores, controlling for pretest scores?” Or,
    • “Would the groups differ on the posttest if they had been equivalent on the pretest?”
  • Our regression/ANCOVA model will provide estimates of the adjusted mean differences, controlling for pre-test score

Baumann et al. (1992) were actually interested in how the groups differed in their post-intervention reading test scores over and above any differences on a reading test score administered before the intervention


How can we evaluate their research question?


A) Hierarchical regression

B) WLS

C) Robust regression

D) There’s always room for pud

E) Multilevel modeling

ANCOVA Example

  • Last week, we examined differences between post-test scores, but now we will add pre-test score as a model covariate
    • We, therefore, would specify the multiple regression model:

\[\hat{Post}_i=\beta_0 + \beta_1D1_i + \beta_2D2_i + \beta_3Pre_i\]

  • Using this ANCOVA approach, the omnibus, overall effect of the intervention variable is the joint effect of D1 and D2, taken together

  • To obtain this joint effect and its statistical significance, we can follow a hierarchical regression procedure

ANCOVA Example

  • Specifically:

\(\text{Model 1}: \hat{Post}_i= {\color{deeppink} {\beta_0 + \beta_1Pre_i}}\) vs.

\(\text{Model 2 (ANCOVA)}: \hat{Post}_i= {\color{deeppink} {\beta_0 + \beta_1Pre_i}} + \beta_2D1_i + \beta_3D2_i\)


mod1 <- lm(posttest1 ~ pretest1, data = read)
mod2 <- lm(posttest1 ~ pretest1 + as_factor(group), data = read)

ANCOVA Example

summary(mod1)$r.squared # Nested model R2
[1] 0.3202457
summary(mod2)$r.squared # Full model R2
[1] 0.5118617
summary(mod2)$r.squared-summary(mod1)$r.squared # Delta R2
[1] 0.191616

ANCOVA Example

And, for the actual model comparison, the F test:


anova(mod1, mod2)
Analysis of Variance Table

Model 1: posttest1 ~ pretest1
Model 2: posttest1 ~ pretest1 + as_factor(group)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     64 508.88                                  
2     62 365.43  2    143.45 12.169 3.483e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANCOVA Example

  • This \(\Delta R^2 = .19\) represents an almost 20% increase in explained variability in reading comprehension, a notable proportion indeed. This change in \(R^2\) is statistically significant, F (2, 62) = 12.17, p < .001, indicating that the reading intervention is both practically and statistically related to the post-test reading evaluation, over and above the pre-intervention reading score



  • The total proportion of variance in the outcome explained by the linear combination of the explanatory variables is .51, suggesting that 51.2% of the entire variability in reading comprehension scores post-intervention is explained by intervention type and pre-test reading score; that is, more than half(!) of the variability in the reading comprehension scores post-intervention is accounted for by this set of variables, a truly substantial amount.

ANCOVA Example

  • Next, the estimated regression coefficients for D1 and D2 and their statistical significance give results for the specific, planned comparisons among the three treatment groups:


summary(mod2)$coefficients
                       Estimate Std. Error    t value     Pr(>|t|)
(Intercept)          -0.5966478  1.1845062 -0.5037101 6.162502e-01
pretest1              0.6931872  0.1014697  6.8314735 4.205253e-09
as_factor(group)DRTA  3.6265538  0.7361861  4.9261371 6.553543e-06
as_factor(group)TA    2.0361644  0.7449616  2.7332475 8.161831e-03
confint(mod2)
                          2.5 %    97.5 %
(Intercept)          -2.9644420 1.7711465
pretest1              0.4903523 0.8960222
as_factor(group)DRTA  2.1549387 5.0981689
as_factor(group)TA    0.5470074 3.5253214

ANCOVA Example

  • Pertaining to the effect of D1, the DRTA group had significantly higher post-test scores than the control group after adjusting for pre-intervention scores, \(\hat{\beta}_2= 3.63\), 95% CI [2.15, 5.10], t (62) = 4.93, p < .001; that is, partialling out the effect of pre-intervention reading ability, a random kid from the DRTA group is expected to score about 3.63 points higher than a kid in the control condition.
    • Given a 15-point score range in post-intervention, I consider a 3.6-point difference (roughly 24% of the range) a small-to-medium effect, likely with light, yet non-negligible implications for improving reading comprehension.


  • Similarly, for D2, the TA group had significantly higher post-test scores than the control group holding pre-test scores constant, \(\hat{\beta}_3= 2.04\), 95% CI [0.55, 3.53], t (62) = 2.73, p = .008; after accounting for initial reading skills before the intervention, the mean difference between the TA and control groups is estimated at about 2 points (roughly 13.6% of the score range), suggesting an even smaller difference than the DRTA group, but one that may still carry some real-world implications.




What about Using Change Scores Instead of Conditioning on Pretest?

Change Scores

  • Another approach with pre-post designs is to model the difference between post-test and pre-test scores, so that the outcome variable is the difference between post and pre, rather than modeling the posttest while controlling for the pretest:

    • \(Y_{change_i} = Y_{post_i} - Y_{pre_i}\)
    • These are called change scores, difference scores, or gain scores
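A change-score analysis of the Baumann example could look like this in R (assuming, as in the earlier slides, that the data frame `read` is loaded with `pretest1`, `posttest1`, and a factor `group`):

```r
# Sketch: change-score approach, modeling the gain directly
# (assumes data frame `read` with pretest1, posttest1, factor group)
read$change <- read$posttest1 - read$pretest1
change_mod <- lm(change ~ group, data = read)  # one-way ANOVA on gain scores
summary(change_mod)
# The group coefficients are now differences in average *gain*,
# not adjusted posttest means as in the ANCOVA model
```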

The advantages of using change scores are:

  • Logical – you directly model the amount of change from pre to post test
  • Unbiased when there are real group differences at pre-test (e.g., non-randomized experiments or important differences that shouldn’t be ignored)

Limitations of change scores include:

  • Slightly less powerful when random assignment is used
  • More likely to have floor or ceiling effects, which creates a lot of measurement error (i.e., unreliability)

Change Scores vs. ANCOVA

  • Although ANCOVA and change scores represent different approaches to handling the same research design, researchers may actually arrive at different conclusions if they were to use both approaches on the same dataset; this phenomenon is called Lord’s Paradox (Lord, 1967)

  • Lord’s original experiment was meant to evaluate how young men and women differ on weight change over the course of a semester

  • But, men obviously start at a much higher average weight than women

  • In Lord’s dataset, men and women do not change at all over time (mean change = ~0 in both groups). Here’s a simulated illustration…
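One way to simulate data with these properties (the means, SDs, and pre-post correlation below are my own assumptions for illustration, not the exact values behind the slides):

```r
library(MASS)  # for mvrnorm()

# Assumed parameters: each group's mean weight is identical at both
# time points (mean change = 0), but the within-group pre-post
# correlation is well below 1, so individuals regress toward their
# own group's mean
set.seed(1)
n <- 100
rho <- 0.45
Sigma <- matrix(c(36, rho * 36,
                  rho * 36, 36), nrow = 2)  # SD = 6 at both time points
women <- mvrnorm(n, mu = c(130, 130), Sigma = Sigma)
men   <- mvrnorm(n, mu = c(160, 160), Sigma = Sigma)

dat <- data.frame(
  gender  = factor(rep(c("Women", "Men"), each = n)),
  initial = c(women[, 1], men[, 1]),
  final   = c(women[, 2], men[, 2])
)
dat$change <- dat$final - dat$initial
tapply(dat$change, dat$gender, mean)  # both near 0, as in Lord's setup
```

The within-group correlation below 1 is what produces the paradox: the change-score model finds no gender difference, while the ANCOVA slope for `initial` is pulled below 1, leaving a nonzero `gender` coefficient.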

Lord’s Paradox


Lord’s Paradox

summary(lm(change ~ gender, dat))   # Change Scores (t-test)

Call:
lm(formula = change ~ gender, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5233  -3.3977   0.6239   3.9303  13.8364 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1.1692     0.5190   2.253   0.0254 *
genderMen     0.1209     0.7340   0.165   0.8694  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.19 on 198 degrees of freedom
Multiple R-squared:  0.0001369, Adjusted R-squared:  -0.004913 
F-statistic: 0.02711 on 1 and 198 DF,  p-value: 0.8694

Lord’s Paradox

summary(lm(final ~ gender + initial, dat)) # ANCOVA 

Call:
lm(formula = final ~ gender + initial, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4746 -1.9902  0.1751  1.8293  8.0446 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.79411    1.67270   20.20   <2e-16 ***
genderMen   13.42333    0.79429   16.90   <2e-16 ***
initial      0.44538    0.02797   15.92   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.006 on 197 degrees of freedom
Multiple R-squared:  0.9463,    Adjusted R-squared:  0.9457 
F-statistic:  1734 on 2 and 197 DF,  p-value: < 2.2e-16

Lord’s Paradox

Results

  • Using a t test on change scores, we do not conclude a difference between men and women

  • Using ANCOVA, we conclude a difference between men and women in final weight controlling for initial weight

  • So, which approach is correct?! Both approaches are correct!

The two approaches actually answer different research questions:

  • Change scores addresses the question, “What is the difference in weight change between men and women?”

  • ANCOVA answers, “Is there still a difference between men and women in their final weights, after accounting for where each individual started (initial weight)?”



  • Here’s a wonderful “tutorial” by Michael Clark showing more on Lord’s Paradox

Lord’s original experiment was meant to evaluate how young men and women differ on weight change over the course of a semester


Which do you think is more appropriate here?


A) t test on change scores

B) ANCOVA

C) Lisa Simpson’s paradox

D) HTML

E) Haven’t you done well




ANCOVA Assumptions

The assumptions of OLS regression apply equivalently to models with discrete predictors


What can we assume?


A) The linearity assumption is satisfied for the dummy codes

B) Homoscedasticity is the next step in human evolution (following Homo sapiens)

C) Multicollinearity will never be violated

D) We don’t care about normality

E) Influential cases are relevant only when photosynthesis occurs

ANCOVA Assumptions

  • Here’s how that looks
library(car)
mod2 <- lm(posttest1 ~ pretest1 + as_factor(group), data = read)
stud_resid <- rstudent(mod2) # Studentizing the model residuals
scatterplot(stud_resid ~ read$group, boxplot=FALSE)

  • See? I told you, didn’t I?

ANCOVA Assumptions

  • The assumptions of OLS regression apply equivalently to models with discrete predictors and the same diagnostic procedures presented in earlier modules can be used

  • Recall, LINE:

    • Linear relationship (applies to covariate and outcome only)
    • Independence
    • Normally distributed residuals
    • Equal variance (homoscedasticity)
  • And, of course:

    • Multicollinearity
    • Influential cases/outliers
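A sketch of these checks in R, reusing `mod2` from the earlier slides (assuming `read` is loaded and `group` has already been converted to a factor; `vif()` and `influencePlot()` are from the car package):

```r
library(car)

# Standard diagnostics for the ANCOVA model from the earlier slides
mod2 <- lm(posttest1 ~ pretest1 + group, data = read)

plot(mod2, which = 1)  # residuals vs. fitted: linearity + equal variance
plot(mod2, which = 2)  # Q-Q plot: normality of residuals
vif(mod2)              # multicollinearity (GVIF for the factor term)
influencePlot(mod2)    # leverage, studentized residuals, Cook's distance
```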


…But, ANCOVA has one more assumption, remember?


What’s the additional ANCOVA assumption?


A) Do no harm

B) homogeneity of regression/parallelism

C) Never judge a model by its sum of residuals

D) Confounders are never to be found(ers)

E) The continuous and categorical variables must be correlated

Testing for Parallelism

  • We can add an interaction term between group and pre-test and test whether it fits the data better!

\(\hat{Post}_i= {\color{deeppink} {\beta_0}} + {\color{deeppink} {\beta_1D1_i}} + {\color{deeppink} {\beta_2D2_i}} + {\color{deeppink} {\beta_3Pre_i}}\) vs. \(\hat{Post}_i={\color{deeppink} {\beta_0}} + {\color{deeppink} {\beta_1D1_i}} + {\color{deeppink} {\beta_2D2_i}} + {\color{deeppink} {\beta_3Pre_i}} + \beta_4D1_i \times Pre_i + \beta_5D2_i \times Pre_i\)

  • Again, let’s do some hierarchical regression!
read$group <- haven::as_factor(read$group) # Ensure the group variable is treated as a factor

interaction_mod <- lm(posttest1 ~ pretest1 * group, data = read) # By using * we automatically add all the terms, beta1 through beta5 in the model above!

no_int_mod <- lm(posttest1 ~ pretest1 + group, data = read) # the same as mod2 from earlier

Testing for Parallelism

summary(interaction_mod)$r.squared
[1] 0.5351859
summary(no_int_mod)$r.squared
[1] 0.5118617

Testing for Parallelism

summary(interaction_mod)$r.squared-summary(no_int_mod)$r.squared # Delta R2
[1] 0.0233242
anova(interaction_mod, no_int_mod) # F test
Analysis of Variance Table

Model 1: posttest1 ~ pretest1 * group
Model 2: posttest1 ~ pretest1 + group
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     60 347.97                           
2     62 365.43 -2   -17.461 1.5054 0.2302

Testing for Parallelism

  • We should also plot it…
ggplot(read, aes(x = pretest1, y = posttest1, color = group)) +
  geom_smooth(method = "lm", se = TRUE, aes(fill = group), alpha = 0.25) +  # Add linear regression lines with semi-transparent confidence bands
  geom_point(size = 2, alpha = 0.6) +  # Plot the points with slight transparency
  labs(x = "Pretest Score", y = "Posttest Score",  title="Regression Slopes by Group") +
  theme(legend.position = "none") +
  theme_classic()